Santiago de Cuba
Act Now: A Novel Online Forecasting Framework for Large-Scale Streaming Data
Liang, Daojun, Zhang, Haixia, Wang, Jing, Yuan, Dongfeng, Zhang, Minggao
In this paper, we find that existing online forecasting methods have the following issues: 1) They do not consider the update frequency of streaming data and directly use labels (future signals) to update the model, leading to information leakage. 2) Eliminating information leakage can exacerbate concept drift and online parameter updates can damage prediction accuracy. 3) Leaving out a validation set cuts off the model's continued learning. 4) Existing GPU devices cannot support online learning of large-scale streaming data. To address the above issues, we propose a novel online learning framework, Act-Now, to improve the online prediction on large-scale streaming data. Firstly, we introduce a Random Subgraph Sampling (RSS) algorithm designed to enable efficient model training. Then, we design a Fast Stream Buffer (FSB) and a Slow Stream Buffer (SSB) to update the model online. FSB updates the model immediately with the consistent pseudo- and partial labels to avoid information leakage. SSB updates the model in parallel using complete labels from earlier times. Further, to address concept drift, we propose a Label Decomposition model (Lade) with statistical and normalization flows. Lade forecasts both the statistical variations and the normalized future values of the data, integrating them through a combiner to produce the final predictions. Finally, we propose to perform online updates on the validation set to ensure the consistency of model learning on streaming data. Extensive experiments demonstrate that the proposed Act-Now framework performs well on large-scale streaming data, with an average 28.4% and 19.5% performance improvement, respectively. Experiments can be reproduced via https://github.com/Anoise/Act-Now.
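The label decomposition idea described above can be illustrated with a minimal sketch. It assumes the "statistical" part is the per-window mean and standard deviation, which the abstract does not specify; the function names `decompose` and `combine` are hypothetical and are not taken from the Act-Now repository. In an actual model, separate branches would forecast the statistics and the normalized values; here only the decomposition and the combiner step are shown.

```python
from statistics import fmean, pstdev

def decompose(window, eps=1e-8):
    """Split a label window into (mean, std) statistics and normalized values.

    Assumption: mean/std are the statistics of interest; the abstract only
    speaks of "statistical variations" without naming them.
    """
    mean = fmean(window)
    std = pstdev(window)
    normalized = [(v - mean) / (std + eps) for v in window]
    return (mean, std), normalized

def combine(stats, normalized, eps=1e-8):
    """Hypothetical combiner: rescale normalized values with the statistics."""
    mean, std = stats
    return [v * (std + eps) + mean for v in normalized]

window = [2.0, 4.0, 6.0]
stats, normalized = decompose(window)
restored = combine(stats, normalized)
print(stats, restored)
```

Forecasting the two parts separately means a shift in level or scale only has to be tracked by the statistics branch, which is one plausible reading of how the decomposition helps with concept drift.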
DistPred: A Distribution-Free Probabilistic Inference Method for Regression and Forecasting
Liang, Daojun, Zhang, Haixia, Yuan, Dongfeng
Traditional regression and prediction tasks often only provide deterministic point estimates. To estimate the uncertainty or distribution information of the response variable, methods such as Bayesian inference, model ensembling, or MC Dropout are typically used. These methods either assume that the posterior distribution of samples follows a Gaussian process or require thousands of forward passes for sample generation. We propose a novel approach called DistPred for regression and forecasting tasks, which overcomes the limitations of existing methods while remaining simple and powerful. Specifically, we transform proper scoring rules that measure the discrepancy between the predicted distribution and the target distribution into a differentiable discrete form and use it as a loss function to train the model end-to-end. This allows the model to sample numerous samples in a single forward pass to estimate the potential distribution of the response variable. We have compared our method with several existing approaches on multiple datasets and achieved state-of-the-art performance. Additionally, our method significantly improves computational efficiency. For example, compared to state-of-the-art models, DistPred has a 90x faster inference speed. Experimental results can be reproduced through https://github.com/Anoise/DistPred.
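As a rough illustration of a proper scoring rule in discrete, sample-based form, the sketch below uses the standard ensemble estimate of the continuous ranked probability score (CRPS), E|X - y| - 0.5 E|X - X'|. The abstract does not say which scoring rule DistPred uses, so CRPS is an assumption here, and `ensemble_crps` is a hypothetical name. It is written in plain Python for readability; for training, the same expression would be written in an autodiff framework so it can serve as a loss.

```python
def ensemble_crps(samples, target):
    """Sample-based CRPS estimate for one target value (lower is better)."""
    n = len(samples)
    # Average distance between the predicted samples and the observation.
    accuracy = sum(abs(s - target) for s in samples) / n
    # Average distance between all pairs of samples (ensemble spread).
    spread = sum(abs(a - b) for a in samples for b in samples) / (n * n)
    return accuracy - 0.5 * spread

# Samples that a model might emit in a single forward pass (made-up values).
near = [0.9, 1.0, 1.1]
far = [2.9, 3.0, 3.1]
print(ensemble_crps(near, 1.0) < ensemble_crps(far, 1.0))
```

The score rewards samples that sit close to the observation while the spread term keeps the ensemble from collapsing to a single point.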
Minusformer: Improving Time Series Forecasting by Progressively Learning Residuals
Liang, Daojun, Zhang, Haixia, Yuan, Dongfeng, Zhang, Bingzheng, Zhang, Minggao
In this paper, we find that ubiquitous time series (TS) forecasting models are prone to severe overfitting. To cope with this problem, we embrace a de-redundancy approach to progressively reinstate the intrinsic values of TS for future intervals. Specifically, we renovate the vanilla Transformer by reorienting the information aggregation mechanism from addition to subtraction. Then, we incorporate an auxiliary output branch into each block of the original model to construct a highway leading to the ultimate prediction. The output of subsequent modules in this branch will subtract the previously learned results, enabling the model to learn the residuals of the supervision signal, layer by layer. This design facilitates the learning-driven implicit progressive decomposition of the input and output streams, empowering the model with heightened versatility, interpretability, and resilience against overfitting. Since all aggregations in the model are subtractions, the model is called Minusformer. Extensive experiments demonstrate that the proposed method outperforms existing state-of-the-art methods, yielding an average performance improvement of 11.9% across various datasets.
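The layer-by-layer residual idea can be sketched with a toy stack of simple fitters, each of which sees only what remains after the earlier estimates have been subtracted. This is not the Minusformer architecture: the blocks `fit_mean` and `fit_line` are hypothetical stand-ins for Transformer blocks, chosen only to make the subtraction pattern visible.

```python
def fit_mean(values):
    """Block 1 (toy): estimate the stream by its constant mean."""
    m = sum(values) / len(values)
    return [m] * len(values)

def fit_line(values):
    """Block 2 (toy): least-squares straight line over positions 0..n-1."""
    n = len(values)
    x_mean = (n - 1) / 2
    y_mean = sum(values) / n
    denom = sum((x - x_mean) ** 2 for x in range(n))
    slope = sum((x - x_mean) * (y - y_mean) for x, y in enumerate(values)) / denom
    return [y_mean + slope * (x - x_mean) for x in range(n)]

def progressive_residuals(target, blocks):
    """Pass the stream through the blocks, subtracting each estimate in turn."""
    residual = list(target)
    outputs = []
    for block in blocks:
        estimate = block(residual)
        outputs.append(estimate)
        # Subtraction instead of addition: the next block sees only what is left.
        residual = [r - e for r, e in zip(residual, estimate)]
    return outputs, residual

outputs, leftover = progressive_residuals([1.0, 3.0, 5.0, 7.0], [fit_mean, fit_line])
print(outputs, leftover)
```

Summing the per-block outputs reconstructs the part of the signal that the stack has explained, while the leftover shows what no block has captured.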
Does Long-Term Series Forecasting Need Complex Attention and Extra Long Inputs?
Liang, Daojun, Zhang, Haixia, Yuan, Dongfeng, Ma, Xiaoyan, Li, Dongyang, Zhang, Minggao
As Transformer-based models have achieved impressive performance on various time series tasks, Long-Term Series Forecasting (LTSF) tasks have also received extensive attention in recent years. However, due to the inherent computational complexity of Transformer-based methods and their demand for long input sequences, their application to LTSF tasks still raises two major issues that need further investigation: 1) whether the sparse attention mechanisms designed by these methods actually reduce the running time on real devices; 2) whether these models need extra long input sequences to guarantee their performance. The answers given in this paper are negative. Meanwhile, a gating mechanism is embedded into Periodformer to regulate the influence of the attention module on the prediction results. This enables Periodformer to have much more powerful and flexible sequence modeling capability with linear computational complexity, which guarantees higher prediction performance and shorter runtime on real devices. Furthermore, to take full advantage of GPUs for fast hyperparameter optimization (e.g., finding the suitable input length), a Multi-GPU Asynchronous parallel algorithm based on Bayesian Optimization (MABO) is presented. MABO allocates a process to each GPU via a queue mechanism, and then creates multiple trials at a time for asynchronous parallel search, which greatly reduces the search time. Experimental results show that Periodformer consistently achieves the best performance on six widely used benchmark datasets.
Some recent advances in reasoning based on analogical proportions
Bounhas, Myriam, Prade, Henri, Richard, Gilles
Analogical proportions (AP) are statements of the form "a is to b as c is to d". They compare the pairs of items (a, b) and (c, d) in terms of their differences and similarities. The explicit use of APs in analogical reasoning has contributed to a renewal of its applications, leading to many developments, especially in the last decade; see [30] for a survey. However, even if much has been already done both at the theoretical and at the practical levels, the very nature of APs may not yet be fully understood and their full potential explored. In the following, we survey recent works on APs along three directions: their role in classification tasks [4]; their use for providing explanations [20]; their relation with multi-valued dependencies [21]. This just intends to be an introductory paper, and the reader is referred to the above references for more details on each issue.
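For Boolean items, the statement "a is to b as c is to d" is commonly formalized as: a differs from b exactly as c differs from d. The sketch below assumes that standard Boolean definition and extends it componentwise to feature vectors; the function names are illustrative only and do not come from the surveyed works.

```python
from itertools import product

def is_analogical_proportion(a, b, c, d):
    """Boolean a : b :: c : d, i.e. a differs from b exactly as c differs from d."""
    a, b, c, d = bool(a), bool(b), bool(c), bool(d)
    return ((a and not b) == (c and not d)) and ((not a and b) == (not c and d))

def vectors_in_proportion(u, v, w, x):
    """Componentwise extension: every feature must be in proportion."""
    return all(is_analogical_proportion(*bits) for bits in zip(u, v, w, x))

# Enumerate which of the 16 Boolean 4-tuples form a valid proportion.
valid = [p for p in product((0, 1), repeat=4) if is_analogical_proportion(*p)]
print(valid)
print(vectors_in_proportion((1, 0, 1), (1, 1, 0), (0, 0, 1), (0, 1, 0)))
```

Only six of the sixteen patterns hold, which is what makes the proportion informative enough to drive classification: knowing three items constrains the fourth.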
What's Different between Visual Question Answering for Machine "Understanding" Versus for Accessibility?
Cao, Yang Trista, Seelman, Kyle, Lee, Kyungjun, Daumé, Hal III
In visual question answering (VQA), a machine must answer a question given an associated image. Recently, accessibility researchers have explored whether VQA can be deployed in a real-world setting where users with visual impairments learn about their environment by capturing their visual surroundings and asking questions. However, most of the existing benchmarking datasets for VQA focus on machine "understanding" and it remains unclear how progress on those datasets corresponds to improvements in this real-world use case. We aim to answer this question by evaluating a variety of VQA models on machine "understanding" datasets (VQA-v2) and accessibility datasets (VizWiz) and examining the discrepancies between them. Based on our findings, we discuss opportunities and challenges in VQA for accessibility and suggest directions for future work.
Guiding Symbolic Natural Language Grammar Induction via Transformer-Based Sequence Probabilities
Goertzel, Ben, Madrigal, Andres Suarez, Yu, Gino
A novel approach to automated learning of syntactic rules governing natural languages is proposed, based on using probabilities assigned to sentences (and potentially longer word sequences) by transformer neural network language models to guide symbolic learning processes like clustering and rule induction. This method exploits the learned linguistic knowledge in transformers, without any reference to their inner representations; hence, the technique is readily adaptable to the continuous appearance of more powerful language models. We show a proof-of-concept example of our proposed technique, using it to guide unsupervised symbolic link-grammar induction methods drawn from our prior research.
Knowledge Graphs
Hogan, Aidan, Blomqvist, Eva, Cochez, Michael, d'Amato, Claudia, de Melo, Gerard, Gutierrez, Claudio, Gayo, José Emilio Labra, Kirrane, Sabrina, Neumaier, Sebastian, Polleres, Axel, Navigli, Roberto, Ngomo, Axel-Cyrille Ngonga, Rashid, Sabbir M., Rula, Anisa, Schmelzeisen, Lukas, Sequeda, Juan, Staab, Steffen, Zimmermann, Antoine
In this paper we provide a comprehensive introduction to knowledge graphs, which have recently garnered significant attention from both industry and academia in scenarios that require exploiting diverse, dynamic, large-scale collections of data. After a general introduction, we motivate and contrast various graph-based data models and query languages that are used for knowledge graphs. We discuss the roles of schema, identity, and context in knowledge graphs. We explain how knowledge can be represented and extracted using a combination of deductive and inductive techniques. We summarise methods for the creation, enrichment, quality assessment, refinement, and publication of knowledge graphs. We provide an overview of prominent open knowledge graphs and enterprise knowledge graphs, their applications, and how they use the aforementioned techniques. We conclude with high-level future research directions for knowledge graphs.
Russia launches facial recognition programme to find anyone's face on Twitter
A Russian company has launched a programme that can identify a stranger among 300 million Twitter users in less than a second. The social media platform has responded to the new software, called "FindFace", saying its use is in "violation" of its rules and it is taking the matter "very seriously". "We see lots of opportunities for Twitter users on the service," Artem Kukharenko, co-founder of NTechLab, told BuzzFeed. "We think this is something many people will use," he added, claiming the technology could be used to reduce spam profiles. "Not in the US, but in other countries there is a real problem of politicians, reporters, finding that someone created a fake account for them. "I was involved back in Russia with scandals with a fake account posing as a politicians that tweeted something and created political scandal," he said. Christopher Weatherhead, Technologist at Privacy International said: "The software created by NTechLab highlights the ease to which cross-referencing profiles photos is possible.
Analogical Dissimilarity: Definition, Algorithms and Two Experiments in Machine Learning
Miclet, L., Bayoudh, S., Delhay, A.
This paper defines the notion of analogical dissimilarity between four objects, with a special focus on objects structured as sequences. Firstly, it studies the case where the four objects have a null analogical dissimilarity, i.e. are in analogical proportion. Secondly, when one of these objects is unknown, it gives algorithms to compute it. Thirdly, it tackles the problem of defining analogical dissimilarity, which is a measure of how far four objects are from being in analogical proportion. In particular, when objects are sequences, it gives a definition and an algorithm based on an optimal alignment of the four sequences. It also gives learning algorithms, i.e. methods to find the triple of objects in a learning sample which has the least analogical dissimilarity with a given object. Two practical experiments are described: the first is a classification problem on benchmarks of binary and nominal data, the second shows how the generation of sequences by solving analogical equations enables a handwritten character recognition system to rapidly be adapted to a new writer.
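For the simplest case of binary features, one way to make these notions concrete is sketched below. It assumes the dissimilarity of four bits is the minimum number of bit flips needed to reach a valid Boolean proportion, summed over features for vectors, and it solves the equation a : b :: c : x bit by bit. The sequence case based on optimal alignment is not covered, and all function names are hypothetical.

```python
from itertools import product

def in_proportion(a, b, c, d):
    """Boolean a : b :: c : d."""
    a, b, c, d = bool(a), bool(b), bool(c), bool(d)
    return ((a and not b) == (c and not d)) and ((not a and b) == (not c and d))

# The six Boolean 4-tuples that form a valid proportion.
VALID = [p for p in product((0, 1), repeat=4) if in_proportion(*p)]

def bit_dissimilarity(a, b, c, d):
    """Minimum number of bit flips that turns (a, b, c, d) into a proportion."""
    quad = (a, b, c, d)
    return min(sum(x != y for x, y in zip(quad, p)) for p in VALID)

def vector_dissimilarity(u, v, w, x):
    """Assumed extension to binary vectors: sum the per-feature values."""
    return sum(bit_dissimilarity(*bits) for bits in zip(u, v, w, x))

def solve_bit(a, b, c):
    """Solve a : b :: c : x for one bit; returns None when no solution exists."""
    if a == b:
        return c
    if a == c:
        return b
    return None

print(bit_dissimilarity(0, 1, 1, 0), bit_dissimilarity(0, 0, 0, 1))
print(solve_bit(0, 1, 0), solve_bit(0, 1, 1))
```

A dissimilarity of zero recovers exactly the analogical proportion, so a nearest-triple search can rank candidate triples by this value when no exact proportion exists in the learning sample.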